Exploiting Lists of Names for Named Entity Identification of Financial Institutions from Unstructured Documents
نویسندگان
چکیده
There is a wealth of information about financial systems that is embedded in document collections. In this paper, we focus on a specialized text extraction task for this domain. The objective is to extract mentions of names of financial institutions, or FI names, from financial prospectus documents, and to identify the corresponding real world entities, e.g., by matching against a corpus of such entities. The tasks are Named Entity Recognition (NER) and Entity Resolution (ER); both are well studied in the literature. Our contribution is to develop a rule-based approach that will exploit lists of FI names for both tasks; our solution is labeled Dict-based NER and Rank-based ER. Since the FI names are typically represented by a root, and a suffix that modifies the root, we use these lists of FI names to create specialized root and suffix dictionaries. To evaluate the effectiveness of our specialized solution for extracting FI names, we compare Dict-based NER with a general purpose rule-based NER solution, ORG NER. Our evaluation highlights the benefits and limitations of specialized versus general purpose approaches, and presents additional suggestions for tuning and customization for FI name extraction. To our knowledge, our proposed solutions, Dict-based NER and Rank-based ER, and the root and suffix dictionaries, are the first attempt to exploit specialized knowledge, i.e., lists of FI names, for rule-based NER and ER. CCS Concepts: rInformation systems→ Extraction, transformation and loading; rComputing methodologies→ Information extraction;
منابع مشابه
Named Entity Recognition in Persian Text using Deep Learning
Named entities recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...
متن کاملA Hybrid Stepwise Approach for De-identifying Person Names in Clinical Documents
As Electronic Health Records are growing exponentially along with large quantities of unstructured clinical information that could be used for research purposes, protecting patient privacy becomes a challenge that needs to be met. In this paper, we present a novel hybrid system designed to improve the current strategies used for person names de-identification. To overcome this task, our system ...
متن کاملبهبود شناسایی موجودیتهای نامدار فارسی با استفاده از کسره اضافه
Named entity recognition is a process in which the people’s names, name of places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), date, currency and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...
متن کاملScalable, Trie-based Approximate Entity Extraction for Real-Time Financial Transaction Screening
Financial institutions have to screen their transactions to ensure that they are not affiliated with terrorism entities. Developing appropriate solutions to detect such affiliations precisely while avoiding any kind of interruption to large amount of legitimate transactions is essential. In this paper, we present building blocks of a scalable solution that may help financial institutions to bui...
متن کاملPAYMA: A Tagged Corpus of Persian Named Entities
The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1602.04427 شماره
صفحات -
تاریخ انتشار 2016